Strain/species identification in metagenomes using genome-specific markers
Abstract
Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which are then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 strain-specific and 11 736 360 species-specific 50-mer GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their target genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and that 10% of the selected GSMs in a database should be detected for a confident positive call. Application of GSMs to gastrointestinal tract metagenomes identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals, respectively. Our results agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing.
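The core idea in the abstract, k-mers that occur in exactly one reference genome plus a minimum fraction of detected markers for a positive call, can be illustrated with a small sketch. This is not the GSMer implementation: the function names, the toy k-mer length of 5 (the abstract uses 50-mers), and the exact-match uniqueness rule are assumptions made for illustration, and the published method may apply additional filtering that the abstract does not describe. Only the 10% detection threshold is taken from the text above.

```python
from collections import defaultdict

K = 5  # toy k-mer length for illustration; the abstract reports 50-mers

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq, k=K):
    """Set of canonical k-mers in seq, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            found.add(canonical(window))
    return found


def genome_specific_markers(genomes, k=K):
    """Map each genome name to the k-mers found in that genome and no other.

    Assumption: uniqueness is judged by exact k-mer match only.
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    markers = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:
            markers[next(iter(names))].add(kmer)
    return markers


def call_strains(reads, markers, k=K, min_fraction=0.10):
    """Call a strain present if at least min_fraction of its markers occur in the reads."""
    observed = set()
    for read in reads:
        observed |= kmers(read, k)
    calls = {}
    for name, gsms in markers.items():
        if not gsms:
            continue  # a genome without unique k-mers cannot be called this way
        detected_fraction = len(gsms & observed) / len(gsms)
        calls[name] = detected_fraction >= min_fraction
    return calls
```

Using canonical k-mers lets a marker match reads from either DNA strand. A practical pipeline at the scale described (thousands of genomes, millions of 50-mers) would need a far more memory-efficient index than in-memory Python sets.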
Similar resources
Identification of bovine, ovine and caprine pure and binary mixtures of raw and heat processed meats using species specific size markers targeting mitochondrial genome
A specific polymerase chain reaction (PCR) method was applied for identification of bovine (Bos taurus), ovine (Ovis aries) and caprine (Capra hircus) pure and binary mixtures of raw and heat-processed meats. These meats are used in food industry products and/or consumed directly. The mitochondrial DNA was amplified as a template in a PCR reaction by use of specific primers re...
University of Oklahoma Graduate College: Metagenomic Insights into Microbial Community Responses to Long-term Elevated CO2 (a dissertation)
Chapter 1: Introduction. 1.1 Atmospheric CO2: the background. 1.2 Effects of elevated atmospher...
Searching the genome of beluga (Huso huso) for sex markers based on targeted Bulked Segregant Analysis (BSA)
In sturgeon aquaculture, where the main purpose is caviar production, a reliable method is needed to separate fish according to gender. Currently, due to the lack of external sexual dimorphism, the fish are sexed by an invasive surgical examination of the gonads. Development of a non-invasive procedure for sexing fish based on genetic markers is of special interest. In the present study we empl...
AFLP reveals no sex-specific markers in Persian sturgeon (Acipenser persicus) or beluga sturgeon (Huso huso) from the southern Caspian Sea, Iran
The late sexual maturity of sturgeon and the absence of morphological differences between males and females make sex discrimination difficult. Identification of sex at an early life stage is of high interest in caviar production because it allows efficient selection of females. In this study, the genome of 10 mature male and 10 mature female specimens of Persian sturgeon (Acipenser persicus...
An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography.
We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into...
Journal title:
Volume 42, issue:
Pages: -
Publication date: 2014